Forensic Science International: Genetics
Elsevier BV
Preprints posted in the last 30 days, ranked by how well they match Forensic Science International: Genetics's content profile, based on 24 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Akane, O.; Kawaguchi, Y. W.; Niwa, T.; Uno, Y.; Kuraku, S.
The effective management of threatened shark populations relies on accurate demographic data, particularly operational sex ratios. While sex identification in intact shark bodies is straightforward through the presence of external male organs, namely claspers, it remains impossible for processed fins in the illegal wildlife trade, early-stage embryos in breeding programs, or archived tissue fragments and blood samples where morphological traits are lost. Here, we present a robust molecular sexing framework leveraging recently identified sequences from shark sex chromosomes, which, to our current knowledge, are consistently organized in an XY system. Our approach consists of two distinct methodologies tailored to the current identification status of sex chromosome sequences in the target species. For the whale shark Rhincodon typus and the brownbanded bamboo shark Chiloscyllium punctatum, we employed end-point PCR assays targeting male-specific Y-linked markers. For the cloudy catshark Scyliorhinus torazame, we developed a quantitative PCR (qPCR) assay targeting differential X chromosome dosage. In this dosage-based system, females (XX) are distinguished by an amplification profile approximately one cycle earlier than males (XY). By integrating X-linked dosage quantification, our framework provides a critical internal control that significantly enhances reliability, allowing researchers to distinguish true females from PCR failures. This toolkit offers a versatile solution for diverse applications, ranging from the study of sex determination mechanisms in pre-phenotypic embryos to the reconstruction of sex ratios from space-constrained tissue archives and global wildlife forensics, thereby contributing to the comprehensive conservation of shark biodiversity.
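The "one cycle earlier" rule follows from PCR arithmetic: at close to 100% amplification efficiency, each cycle doubles the product, so a twofold difference in X-linked template (XX versus XY) shifts the threshold cycle (Ct) by about one. The sketch below illustrates that reasoning only. The autosomal reference Ct, the calibrated female ΔCt, and the tolerance are hypothetical inputs, not values or design details taken from the abstract.

```python
def classify_sex(ct_x, ct_ref, female_delta_ct, tolerance=0.4):
    """Call sex from X-linked dosage, assuming ~100% PCR efficiency.

    ct_x            -- Ct of the X-linked target in the sample
    ct_ref          -- Ct of an autosomal reference target (hypothetical control)
    female_delta_ct -- ct_x - ct_ref measured on a known female (hypothetical calibrator)
    tolerance       -- allowed deviation in cycles (illustrative value)
    """
    # Normalise the X-linked signal to the reference, then to the female calibrator.
    delta_delta_ct = (ct_x - ct_ref) - female_delta_ct
    # Each cycle is a twofold change, so dose relative to XX is 2^(-ddCt).
    relative_dose = 2.0 ** (-delta_delta_ct)

    if abs(delta_delta_ct) <= tolerance:
        call = "female"        # same X dosage as the XX calibrator
    elif abs(delta_delta_ct - 1.0) <= tolerance:
        call = "male"          # about one cycle later, i.e. half the X dosage
    else:
        call = "inconclusive"  # e.g. degraded template or failed amplification
    return call, relative_dose


if __name__ == "__main__":
    print(classify_sex(24.0, 22.0, 2.0))
    print(classify_sex(25.0, 22.0, 2.0))
```

Because the X-linked target amplifies in both sexes, a sample with no amplification at all is flagged rather than silently called female, which is the internal-control property the abstract highlights.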
Filipovic-Sadic, S.; Parker, C. A.; Mihailovic, M. K.; Milligan, J. N.; Turner, J. M.; Borel, S. L.; Le, V.; Markulin, T.; Janovsky, J. W.; Killinger, B. J.; Deshotel, M. J.; Reading, N. S.; Fredrickson, E. K.; Ji, Y.; Close, D.; Wright, J.; Williams, M.; Barrie, E. S.; Martin, K. E.; Gray, S. M.; Haynes, B. C.; Hall, B.
Purpose: Carrier screening for hereditary conditions is challenged by genes with complex genomic architecture, where short-read sequencing can fail to detect clinically relevant variants. This study evaluated a unified, amplification-based nanopore sequencing workflow across multiple laboratories for comprehensive analysis of such loci. Methods: A modular long-read sequencing assay was evaluated across five laboratories using targeted PCR enrichment, Oxford Nanopore sequencing, and automated variant analysis. The workflow interrogated genes associated with spinal muscular atrophy, thalassemia, cystic fibrosis, fragile X syndrome, congenital adrenal hyperplasia, Gaucher disease, and hemophilia A. Performance was assessed against orthogonal methods for single nucleotide variants (SNVs), indels, copy-number variants, repeat expansions, and structural rearrangements. Results: Across 882 unique samples (1,266 tests), overall agreement with comparator methods exceeded 96% for variant-level detection and 97% for genotype status classification. Long-read sequencing enabled phasing of paralogous loci, integrated sizing and interruption analysis for FMR1 repeats, and simultaneous detection of SNVs and structural variants in globin loci and the CYP21A2-TNXB region, reducing reliance on multiple workflows. Conclusion: This multisite evaluation suggests that targeted long-read sequencing can consolidate complex variant detection into a single workflow, improving analytical completeness and operational efficiency for carrier screening.
Bougiouri, K.; Irving-Pease, E. K.; Frantz, L. A. F.; Racimo, F.; Petr, M.
Recent advances in genome imputation have enabled the application of state-of-the-art statistical methods, originally developed for present-day genomes, to ancient genomes. One class of such methods, known as local ancestry inference (LAI), can model an individual's genome as a mosaic of tracts assigned to different putative ancestral sources, revealing patterns of genetic ancestry across the genome. However, most LAI methods have been designed to study recent admixture events in human history, and they generally assume large panels of present-day genomes. Despite the recent availability of high-quality imputed ancient genomes, it remains unknown to what degree LAI is reliable for such datasets. Ancient DNA is often characterized by heterogeneous geographic and temporal sampling, varying degrees of divergence between ancient source proxies and admixing populations, and complex demographic histories. Here, we performed an extensive set of population genetic simulations to evaluate the accuracy of four popular LAI methods (RFMix, FLARE, MOSAIC and simpLAI) under different demographic scenarios, various temporal sampling schemes, sample sizes, and admixture dates. We quantify the accuracy of these methods as a function of different parameters in practically relevant scenarios, and provide general guidelines for future studies utilizing LAI in ancient DNA research.
Rodriguez, L. K.; Schallhart, S.; Hobmeier, P.; Curran, T.; Perez-Jorge, S.; Prieto, R.; Oliveira, C.; Silva, M. A.; Thalinger, B.
Environmental DNA (eDNA) analyses have become a powerful tool for non-invasive biodiversity monitoring, yet the applicability of population genetic approaches to environmental samples remains largely unexplored. Even when genetic traces originate from a single individual, low target DNA concentrations and amplification or sequencing artefacts can compromise downstream genetic inferences. Here, we present a novel approach for obtaining demographic insights and lineage-level mitogenomic information from aquatic eDNA samples collected near vertebrate individuals. Paired eDNA and tissue samples were collected during sperm whale (Physeter macrocephalus) encounters in the Azores. Samples were screened for the presence of vertebrate eDNA and analyzed with a novel molecular sex identification assay. Additionally, long-range PCR was used to amplify up to five mitochondrial DNA fragments (~3-4 kb) before subsequent sequencing on an Oxford Nanopore Technologies platform. A stringent three-tier filtering framework capable of identifying true mitogenomic variation across eDNA samples was developed for maximum recovery of genetic diversity at the haplogroup level. By benchmarking eDNA samples via their paired tissues, parameter values were optimized to maximize concordance and minimize spurious variant calls. Sexing was successful for 50% of eDNA samples, with 96% concordance to paired tissues, and marine vertebrate DNA concentration significantly predicted sexing success. Further, Medaka polishing produced high-identity mitochondrial consensus sequences (>16 kb) from eDNA samples. Across filtering regimes in the framework, curated SNP panels comprising up to 453 high-confidence mitochondrial SNPs resolved 19 haplogroups, with 93% concordance between eDNA and tissue samples. An intermediate bioinformatics filtering strategy maximized biologically accurate haplogroup recovery while minimizing sequencing artefacts, providing the most reliable lineage-level inferences. This integrative approach demonstrates that targeted nuclear assays combined with long-range mitochondrial sequencing can recover individual-level genetic information from aquatic eDNA. By defining analytical thresholds governing success, the framework advances non-invasive genetic monitoring of populations via eDNA and enables population-level monitoring and conservation of endangered and genetically vulnerable species.
Bravington, M. V.; Baylis, S. M.; Eveson, P.; Feutry, P.
Close-Kin Mark-Recapture (CKMR) is a statistical framework for estimating demographic parameters of wild populations. Instead of recapturing individuals, it relies on the identification of closely related pairs such as parents and offspring, or siblings. By measuring how often such close kin are "recaptured" among sampled animals (whether alive or dead), scientists can estimate demographic parameters such as census size, mortality rates, and connectivity. CKMR is starting to change fisheries and wildlife management by giving more reliable demographic information, even for many species that resist conventional approaches. Here we introduce the kinference R package, which provides a set of tools for finding close-kin pairs among thousands of samples each genotyped at thousands of SNPs, and for associated quality control. The CKMR context implies requirements and assumptions that differ from those of many other kinship programs. In particular, kinference accounts empirically for linkage without requiring a genome assembly, is able to estimate and control false-negative and false-positive probabilities, and can cope with null alleles. The package has been developed and used in numerous CKMR projects since 2017. This paper documents the assumptions, statistical algorithms, and intended workflow for kinference.
Kaur, R.; Dewan, C.; Chauhan, I.; Sharma, K.; Sharma, S.
Assessing reproducibility across different molecular profiling studies is a persistent methodological challenge (Zhang et al., 2009; Sweeney et al., 2017; Ioannidis, 2005). Differences in platform technology, cohort composition, analytical pipelines, and feature definitions often make it difficult to interpret cross-study comparisons based solely on gene-identity overlap. In this study, we conducted a retrospective computational analysis of seven publicly available analytical datasets (including alternative analytical pipelines applied to the same cohort) derived from five biologically independent peripheral blood transcriptomic and DNA methylation cohorts, comprising 3,487 samples (1,824 Parkinson's disease cases and 1,663 controls). Reproducibility was evaluated using gene-identity overlap, enrichment-based comparisons, and a permutation-based framework to assess directional consistency of effect estimates across datasets. We also tested the robustness of results by varying false discovery rate thresholds and applying alternative probe-to-gene collapsing strategies. All analyses were performed using reproducible workflows implemented in R and Python with fixed random seeds. Across independent cohorts, gene-identity overlap was generally limited, with enrichment ratios close to one, especially when datasets were generated using different platforms. In several datasets, limited numbers of statistically significant features further constrained overlap-based comparisons. In contrast, directional consistency showed greater stability. High levels of directional consistency were observed across independent cohort comparisons when restricted to overlapping statistically significant features and remained stable across statistical thresholds (90.0% at FDR < 0.05 and 82.8% at FDR < 0.10).
When evaluated across the full shared gene universe without conditioning on statistical significance, directional consistency was substantially lower (~30 to 32%) but remained significantly above permutation-based null expectations. Permutation testing confirmed that the observed directional consistency exceeded what would be expected by chance. A combined analysis including methodological replicates (n ≥ 3 datasets) showed 98.3% directional consistency; however, this estimate includes non-independent analytical pipelines applied to the same cohort and reflects analytical stability rather than independent biological replication. Rather than introducing a new statistical method, this study examines how commonly used reproducibility metrics behave under cross-study heterogeneity and identifies their practical limitations and appropriate use boundaries.
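As a rough illustration of the directional-consistency idea, the sketch below computes the fraction of shared genes whose effect estimates have the same sign in two datasets, and compares it with a null built by shuffling one dataset's effects across genes. This is one plausible permutation scheme under stated assumptions, not necessarily the one used in the study. The function names, the dictionary input format, and the default number of permutations are all hypothetical.

```python
import random


def directional_consistency(effects_a, effects_b):
    """Fraction of shared genes whose effects have the same sign in both datasets."""
    shared = sorted(set(effects_a) & set(effects_b))
    if not shared:
        raise ValueError("no shared genes between the two datasets")
    agree = sum((effects_a[g] > 0) == (effects_b[g] > 0) for g in shared)
    return agree / len(shared)


def permutation_p_value(effects_a, effects_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed consistency.

    The null is generated by shuffling dataset B's effects across the shared
    genes (an assumed null model), with a fixed seed for reproducibility.
    """
    rng = random.Random(seed)
    shared = sorted(set(effects_a) & set(effects_b))
    observed = directional_consistency(effects_a, effects_b)
    a_vals = [effects_a[g] for g in shared]
    b_vals = [effects_b[g] for g in shared]

    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b_vals)  # break the gene-to-effect pairing in B
        agree = sum((x > 0) == (y > 0) for x, y in zip(a_vals, b_vals))
        if agree / len(shared) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = {"g1": 1.0, "g2": -0.5, "g3": 0.2, "g4": -1.0}
    b = {"g1": 0.3, "g2": -0.1, "g3": -0.4, "g5": 2.0}
    print(permutation_p_value(a, b))
```

Restricting the input dictionaries to features that pass an FDR threshold in both datasets, or passing the full shared gene universe, corresponds to the two evaluation settings contrasted in the abstract.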
Zhang, N.; Li, L.; Ta, K.; Shi, C.; Seim, I.; Zhang, Y.; Zhang, W.; Cui, Z.; Xiang, X.; Jia, L.; Ge, Q.; Du, M.; Xie, T.; Ji, Q.; Yue, Z.; Fan, G.; Liu, S.; Meng, L.
Deep-sea corals are vital in maintaining coral ecosystem biodiversity, yet their genetic characteristics remain largely unexplored. Here, we present 11 deep-sea coral genome assemblies, including four Hexacorallia and seven Octocorallia species, significantly contributing new genomic information across two orders. Our analysis reveals the historical dynamics of coral speciation and the influence of environmental factors on the evolution of coral reef ecosystems. A total of 126 horizontal gene transfer (HGT) events were detected, among which genes from the ancestor of Symbiodiniaceae indicate that the ancestors of deep-sea corals may have inhabited shallow-sea environments. Notably, several of these HGTs are involved in phosphorus (PhnX/PhnW) and cholesterol (DHCR7) metabolism within corals, indicating that HGTs may serve as an adaptive survival strategy for the coral holobionts. Deep-sea corals also rely on symbiotic bacteria to synthesize 10 essential amino acids (such as valine and tyrosine), retaining only partial amino acid synthesis capacity. In addition, we investigated the evolution of key biological rhythm genes and temperature adaptation in corals. The loss of key rhythm genes (e.g., clock and cry) in deep-sea corals and copy-number differences in genes related to heat stress (e.g., Cbl-b and Rchy) revealed genetic differences between deep-sea and shallow-sea corals. Our new genome assemblies enhance the understanding of deep-sea coral evolution, biodiversity, and adaptation, providing a genetic foundation for coral conservation.
Gao, Y.; Wang, W.; Liu, Y.; Wu, J.; Wang, L.; Wei, J.; Dai, M.; Wei, C.; Tian, L.; Jiang, C.; Su, J.; Xue, H.; Liu, H.; Ni, J.; Jiang, S.; Cai, D.; Zheng, X.; Zhang, D.; Bai, S.
Climate change poses an increasing threat to the cultivation of deciduous fruit trees, placing greater demands on modern pear breeding. Using pear germplasm adapted to diverse environments, we assembled 11 chromosome-level genomes. In combination with 13 publicly accessible pear genomes, we analyzed presence-absence variations (PAVs) and constructed a graph-based pangenome for pear. By performing a PAV-eQTL analysis of the fruit of 123 pear accessions, we identified PAVs significantly associated with expression levels of genes that may be involved in regulating agronomic traits. Population analysis of 268 pear accessions revealed two stop-gained variants in DAM1 of independent origin, which may function in advancing the blooming date and reducing the chilling requirement. We detected complex PAVs at the NOR1 locus, including two copy-number variations and one deletion. These PAVs contributed to the rapid diversification of the NOR1 locus and the fruit development period through regulating ARF5 and other ripening-related genes. We revealed the selection history of the NOR1 locus and developed novel pear individuals that accumulated alleles for low chilling requirement, early blooming date, and short fruit development period. The results provide valuable resources for pear genomics research and offer a guideline for breeding modern pears with climate resilience.
Johnson, E.; Jin, C.; Guinet, B.; Alumbaugh, J.; Martin, N. L.
The application of metagenomics in ancient DNA (aDNA) research is rapidly expanding, driven in particular by advances in sedimentary aDNA research and sequencing technologies. Although many ancient DNA studies rely on broadly similar bioinformatic strategies, there is still no single standardized, widely adopted workflow. These differences can directly affect how efficiently past biodiversity can be reconstructed and authenticated from the various archives analyzed using ancient metagenomic approaches. Although a few pipelines tackle the processing of ancient DNA data from shotgun sequencing, the ones applied to metagenomic datasets are scarce and often resource-intensive or challenging to install, update, or extend with new tools and parameters. metaJAM, a scalable and user-friendly pipeline, is presented here to specifically address the challenges of metagenomic aDNA analyses of eukaryotes. The pipeline has been designed in Nextflow to ensure continuous development and can be used on different high-performance computing (HPC) clusters. metaJAM integrates all key steps required for ancient DNA metagenomic analyses, from raw sequencing data pre-processing to microbial filtering, taxonomic assignment via competitive iterative mapping against Bowtie 2 reference indexes and reassignment using lowest common ancestor (LCA) inference. Validation and authentication are performed using the post-LCA toolkit bamdam together with alignment to an exhaustive reference database using MMseqs2. It allows users to choose among alternative tools and generates a series of plots to support data visualization and taxon authentication. 
metaJAM differs from existing pipelines through its implementation of rigorous filtering of microbial-like reads by Kraken 2 classification and masking microbial-like regions, iterative or parallel Bowtie 2 mapping, validation of the detected taxa and integration of up-to-date tools for ancient metagenomic analysis, along with diagnostic plots that help users assess the reliability of taxonomic assignments and visualize their data. It runs well with limited computational resources, supports customised databases for taxonomic groups, and provides an accessible workflow to support the investigation of metagenomic ancient DNA datasets. Its applications span a range of contexts, from ecosystem reconstructions in environmental aDNA archives such as sediments, to metagenomic studies on archaeological artefacts and even taxonomic identification of undiagnosed biological materials.
Sequeira, A. N.; Szpiech, Z. A.; Huber, C. D.
Identifying signatures of positive selection in humans is complicated by demographic processes such as bottlenecks, migration and admixture, all of which can distort or obscure the genomic patterns produced by selective sweeps. Ancient DNA offers a direct window into past allele and haplotype frequencies, yet most sweep scans in ancient populations rely on allele-frequency or site frequency spectrum (SFS) summaries, with limited use of haplotype-based approaches. Here, we evaluate the performance of haplotype- and SFS-based methods for detecting selective sweeps under demographic scenarios that reflect the complex history of ancient and modern Europeans. We extend the haplotype-based likelihood framework saltiLASSI to accommodate pseudohaploid ancient genomes, enabling the use of truncated haplotype frequency spectra and their spatial decay to detect sweeps without requiring phased data. Using forward-in-time simulations, we examine sweeps of varying ages, two pulses of admixture with different source proportions, and cases where selection continues or ceases after admixture. We compare saltiLASSI to a widely used SFS-based approach (SweepFinder2). Our results show that haplotype-based likelihood models retain higher power than SFS methods in admixed populations, particularly when sweep haplotypes are introduced through migration or when selection has not had sufficient time to regenerate a clear SFS signature after admixture. These findings highlight the promise of haplotype-based inference for ancient DNA and demonstrate how model-based approaches can improve the detection of historical selective sweeps in populations with complex demographic histories.
Wickramasinghe, N.; Choudhary, P.
Imbalances in the human microbiome are associated with numerous diseases, highlighting the need for benchmarks that define healthy microbiome composition and identify abnormal deviations. Although the microbiome is increasingly studied as a potential clinical marker, statistical approaches for constructing reference regions of healthy microbiome composition remain relatively underexplored. This work develops statistical methods to construct reference regions for healthy microbiome data, addressing three main challenges. First, since microbiome data contain relative rather than absolute information, standard statistical methods are not directly appropriate. Therefore, microbiome profiles are treated as compositional data satisfying a sum constraint, and log-ratio transformations are used to analyze them in real space while preserving their relative structure. Second, reference regions are constructed as tolerance regions rather than confidence regions, so that they cover a pre-specified proportion of the healthy population with a given confidence level. The proposed framework incorporates both parametric and nonparametric approaches for constructing these tolerance regions. Parametric methods are considered when the ilr-transformed data approximately follow an elliptical distribution, where they can yield smaller regions while maintaining the desired coverage. Nonparametric approaches provide a flexible alternative by avoiding distributional assumptions. Third, because microbiome data are multidimensional and difficult to interpret, quantitative and graphical tools are introduced to assess atypicality and identify which microbial taxa contribute most to deviations from healthy profiles. Simulation studies are conducted to evaluate the performance of the proposed methods. The methodology is then demonstrated by constructing reference regions for healthy microbiome profiles using real-world data.
Finally, the approach is applied to microbiome datasets comparing healthy and patient profiles to assess whether patient samples are identified as atypical and to examine which taxa contribute to these deviations. Overall, the proposed framework provides a clear and statistically robust approach for defining healthy microbiome reference regions and detecting atypical microbiome profiles.
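The log-ratio step mentioned in the abstract can be sketched briefly. The example below implements the centred log-ratio (clr) and one common isometric log-ratio (ilr) basis, the pivot coordinates, which map a D-part composition to D-1 unconstrained real coordinates. The abstract does not state which ilr basis the authors use, so the pivot basis is an assumption. The sketch also assumes strictly positive parts, so zero counts would need replacement beforehand. It covers only the transformation, not the tolerance-region construction.

```python
import math


def closure(parts):
    """Rescale positive parts so they sum to 1 (the compositional sum constraint)."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def ilr_pivot(parts):
    """Isometric log-ratio in pivot coordinates (assumed basis): D parts to D-1 reals.

    Coordinate i contrasts part i with the geometric mean of all later parts.
    """
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        r = len(rest)  # number of parts after position i
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - sum(rest) / r))
    return coords


if __name__ == "__main__":
    profile = closure([20.0, 30.0, 50.0])  # toy relative abundances
    print(ilr_pivot(profile))
```

Because the transform depends only on ratios between parts, raw counts and their closed proportions give identical coordinates, which is why such transforms suit data carrying only relative information.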
Kambakam, S.; Thomas, J.; Stuber, T.; Wu, P.; Robbe-Austerman, S.; Palinski, R.
African swine fever virus (ASFV), the etiologic agent of African Swine Fever (ASF), is a high-consequence pathogen requiring experiments to be conducted in containment in non-endemic countries, thereby restricting diagnostic development, the creation of reference standards, and proficiency testing (PT). Safe and reliable inactivation methods are essential to expand diagnostic capacity while preserving nucleic acid integrity for molecular assays in unaffected countries. This study employed gamma irradiation to achieve complete inactivation of ASFV without compromising downstream molecular detection, as gamma irradiation offers deep penetration and uniform dose delivery. ASFV-cell culture supernatants were subjected to gamma irradiation doses ranging from 2 to 50 kGy. Viral replication was evaluated using TCID50 and serial passages, revealing a consistent dose-dependent reduction in infectivity across increasing irradiation dose levels and a complete loss of ASFV infectivity at 30 and 50 kGy. Molecular detection remained unaffected at all of the tested doses as confirmed by qPCR Ct values and sequence identity of the p72 gene. Whole genome sequencing demonstrated >99% genome coverage and consistent read depth profiles across irradiated and non-irradiated samples, indicating preservation of genomic integrity at all tested doses. These findings demonstrate that gamma irradiation at 50 kGy fully inactivates ASFV-cell supernatants while maintaining nucleic acid quality suitable for molecular diagnostics. The resulting inactivated material meets quality assurance requirements for molecular reference standards and PT panels and can be safely distributed to laboratories outside high containment facilities, supporting broader diagnostic readiness and harmonization of ASFV testing.
Pham, B. K.; Davenport, S.; Azriel, D.; Schwartzman, A.
LD Score Regression (LDSC) is a prominent method that estimates whole-genome SNP heritability from summary statistics via the slope of a linear regression of GWAS test statistics for a trait of interest against LD scores. The LDSC authors claimed that the free intercept in the regression accounts for confounding bias such as population stratification. In this study, we argue that the intercept in LDSC must be fixed to 1 for accurate SNP heritability estimation. We show both theoretically and with simulations that the estimated intercept does not accurately capture population stratification effects, and that it adversely affects the accuracy of the heritability estimate, introducing bias and increasing variance. Fixing the intercept to 1 eliminates bias and reduces variance when no population stratification is present. On the other hand, under population stratification, LDSC is biased with both the free and the fixed intercept. Additionally, we show that standard errors in LDSC are underestimated, potentially leading to false positives in downstream GWAS analyses.
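To make the fixed-versus-free intercept distinction concrete, the sketch below uses the standard LDSC expectation, E[chi2_j] = N * h2 * l_j / M + intercept, where l_j is the LD score of SNP j, N the sample size, and M the number of SNPs. It is a deliberately simplified, unweighted least-squares version for illustration. The published LDSC software uses weighted regression and a block jackknife, neither of which is reproduced here, and the function names are hypothetical.

```python
def slope_fixed_intercept(chi2, ld_scores, intercept=1.0):
    """Least-squares slope of (chi2 - intercept) on LD score, through the origin."""
    numerator = sum(l * (c - intercept) for c, l in zip(chi2, ld_scores))
    denominator = sum(l * l for l in ld_scores)
    return numerator / denominator


def fit_free_intercept(chi2, ld_scores):
    """Ordinary least squares with an estimated intercept; returns (intercept, slope)."""
    n = len(chi2)
    mean_l = sum(ld_scores) / n
    mean_c = sum(chi2) / n
    sxy = sum((l - mean_l) * (c - mean_c) for c, l in zip(chi2, ld_scores))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    return mean_c - slope * mean_l, slope


def heritability_from_slope(slope, n_samples, n_snps):
    """Invert slope = N * h2 / M to recover the SNP heritability h2."""
    return slope * n_snps / n_samples


if __name__ == "__main__":
    # Noise-free toy data: N = 50,000, M = 1,000,000, h2 = 0.4 gives slope 0.02.
    ld = [10.0, 50.0, 120.0, 200.0]
    chi2 = [1.0 + 0.02 * l for l in ld]
    slope = slope_fixed_intercept(chi2, ld)
    print(heritability_from_slope(slope, 50_000, 1_000_000))
    print(fit_free_intercept(chi2, ld))
```

On noise-free data both fits agree. The abstract's argument concerns how the two estimators behave with sampling noise and stratification, which this toy example does not simulate.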
Rakotoarivony, R.; Carter, E. J.; Racimo, F.; Regnier, D.; Ranaivoarisoa, J. F.; Shriver, M.; Perry, G.; Manica, A.; Hodgson, J. A.
The population of Madagascar exhibits a globally unique combination of African and Asian genetic ancestries. Previous studies have described the admixture history of Madagascar at island-wide scales [1,2], but less focus has been paid to fine-scale population structure across the island. We present new genome-wide genetic data from 192 individuals sampled across five regions of Madagascar. We identify population structure at extremely fine spatial scales (~10 km) among the Merina of the central highlands. By analysing subpopulations separately, we found one Merina group exhibited similarity to coastal populations in f4 ratios, estimated admixture dates, and pairwise FST distances, while another group was similar to other highland individuals in the same measures. This fine-scale substructure is likely associated with historical coastal-to-highland migration during the 18th and 19th centuries. In contrast, we also observe macro-scale structure in estimated timing of admixture across the island, with southeastern coastal groups exhibiting the earliest estimated admixture timings, and northern groups exhibiting the latest. This pattern corroborates previous results [1,2], and may suggest differing histories of admixture timing among Malagasy populations. Our results emphasise the importance of deep micro-geographic sampling to complement macro-scale analysis when characterising demographic history.
Zhang, L.; Paterson, A. D.; Sun, L.
Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X chromosome is more complex due to sex-specific genotype structures and potential sex differences in minor allele frequency (sdMAF). Existing tests differ in their assumptions about sdMAF and male sample inclusion, often leading to distinct but poorly characterized null hypotheses. We develop a general statistical framework for HWE inference using the robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson χ²-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sdMAF. The framework also accommodates covariate and population-structure adjustment within a unified regression-based formulation. The proposed framework provides robust, interpretable, and flexible inference, establishing a unified statistical foundation for HWE testing across autosomal and X-chromosomal regions. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
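For orientation, the classic autosomal Pearson chi-square test that the abstract takes as its starting point can be written in a few lines. This sketch covers only that textbook one-degree-of-freedom test for a biallelic autosomal locus. It is not the authors' regression framework and does not handle X-chromosomal genotypes or sdMAF. The function name and argument names are illustrative.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of HWE for one biallelic autosomal locus.

    Takes the three genotype counts and returns (chi2, p_value) with 1 df.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p <= 0.0 or p >= 1.0:
        raise ValueError("locus is monomorphic; the test is undefined")

    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]  # HWE genotype proportions
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```

On the X chromosome, males contribute hemizygous genotypes and allele frequencies may differ by sex, which is why tests of this simple form need the explicit assumptions the abstract discusses.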
Yang, Q.; Li, L.; Ma, Q.; Yin, R.
Background: DNA lesions arise from endogenous metabolism and environmental exposure and are the major drivers of mutagenesis, aging, and cancer development. However, mapping DNA damage at nucleotide resolution remains a technically challenging task. Nanopore sequencing enables direct detection of chemical perturbations through alterations in ionic current signals. Despite this potential, existing computational approaches remain limited in their capacity to generalize across diverse lesion types and to effectively integrate nucleotide sequence context with raw signal information for accurate detection and localization. Results: We presented DamageFormer, a multimodal deep learning framework for detection and localization of DNA lesions using native nanopore sequencing data. Central to this framework is LesionBERT, a damage-aware genomic foundation model built upon DNABERT-2 and enhanced with lesion-focused reconstruction objectives to improve representation of chemically modified bases. DamageFormer integrated LesionBERT with a neural signal model through an adaptive gating mechanism, enabling dynamic weighting of sequence context and nanopore signal evidence. The model was trained using a joint objective that combines prediction, localization, and contrastive alignment losses to promote cross-modal coherence and spatial precision. On an oxidative DNA damage benchmark comprising paired sequence and signal data, DamageFormer achieved an AUROC of 0.99997 for lesion detection and a mean absolute localization error of 0.00439, consistently outperforming state-of-the-art baselines. Model interpretation analyses revealed context-dependent modality weighting that adapts to variation in signal quality and sequence ambiguity. The proposed framework further generalizes to chemically distinct guanine lesions not observed during the training process, demonstrating its robustness and transferability to unseen damage types.
Conclusions: Damage-aware biological language modeling combined with adaptive multimodal fusion enables accurate and interpretable identification of DNA lesions from nanopore sequencing data. This framework provides a scalable approach for characterizing genome-wide damage landscapes and illustrates how chemical DNA information can be systematically incorporated into genomic language models. The source code and pretrained models of this work are available at: https://github.com/UF-HOBIYin-Lab/DamageFormer.
Zelter, A.; Riffle, M.; Merrihew, G. E.; Mutawe, B.; Shulman, N.; Sanders, J. A.; Noble, W. S.; Johnson Erickson, D. P.; Morimoto, A.; Shaver, B. A.; Steins, T. N.; Cao, N.; Ford, E. C.; Rudnick, P. A.; Chelsky, D.; Wan, K. H.; Inman, J. L.; Chang, H.; Snijders, A. M.; Mao, J.-H.; Celniker, S. E.; De Chant, J.; Obst-Huebl, L.; Nakamura, K.; Wu, C. C.; MacCoss, M. J.
Ionizing radiation induces molecular responses that may be used to estimate exposure when physical dosimeters are unavailable. Here we present two large-scale proteomics datasets generated from mouse dorsal skin punch samples collected following controlled X-ray exposures spanning multiple doses, dose rates, and post-exposure time points. Experiment 1 comprised 96 samples (including 16 reference samples) collected 6 days after exposure to 0-75 cGy delivered at either 30 or 300 cGy/min. Experiment 2 comprised 936 samples (including 236 reference samples) exposed to 0-100 cGy at either 3 or 28 cGy/min dose rates and harvested between 7 and 150 days post-exposure. All samples were processed using a standardized workflow involving automated bead-based digestion and data-independent acquisition mass spectrometry. The datasets include multiple pooled reference sample types, process controls, and system suitability standards ensuring high quality data. All data presented are available via ProteomeXchange at several levels of processing, from raw files through normalized peptide- and protein-level abundance matrices suitable for biomarker discovery and machine learning applications. This dataset will facilitate generation of new insights into the biological changes and molecular signatures resulting from X-ray exposure in mice and may also help inform future studies in humans.
Engman, V.; Lamon, S.; Mason, S.
Sex steroid hormones are not exclusively localised in the circulation and can be found in numerous extragonadal tissues, in concentrations unrelated to the circulating fraction. Existing methodology to measure intramuscular steroid hormone concentrations includes both immune-based assays and liquid chromatography-mass spectrometry (LC-MS), the gold standard for hormone measurements. To date, no validation of an LC-MS-based method for measuring intramuscular sex steroid hormones has been published, despite clear biological relevance. Here, we describe the development and validation of a simple, high-throughput LC-MS Orbitrap method for the measurement of 10 intramuscular sex steroid hormones: pregnenolone, progesterone, dehydroepiandrosterone, androstenedione, testosterone, epitestosterone, dihydrotestosterone, oestrone, oestradiol, and oestriol. In brief, isotope-labelled standards were added to 5-6 milligrams of lyophilised muscle tissue, which was then homogenised and extracted with ethyl acetate. The extracts were dried down and sequentially derivatised with 1-methylimidazole-2-sulfonyl chloride and hydroxylamine hydrochloride to target both the phenolic hydroxyl groups and the ketone groups. The limit of detection was 1.0 ± 1.0 pg/mg (range 0.36-3.26 pg/mg), with R² > 0.99 for all analytes. Matrix effects were 90-110% for all analytes except dihydrotestosterone (143.6%), and precision was <10 CV% for all analytes in the presence of a muscle matrix. Our method allows 20-40 samples to be prepared in ~4 h, with a data acquisition time of 13 minutes per sample. Moreover, our method provides the opportunity for specific analysis of steroid hormone concentrations in skeletal muscle, allowing target tissue specificity instead of relying on proxy measures from the circulation.
Shakeri, F.; Mehdian, H.; Bakhtiyari-Ramezani, M.; Amini, E.; Hajisharifi, K.
Staphylococcus aureus (S. aureus) is the most common pathogen associated with skin infections worldwide. Significant efforts have been made to identify and develop innovative therapeutic strategies against S. aureus as alternatives to conventional antibiotics. Physical plasma has a broad range of potential uses, with non-destructive disinfection being one of its earliest applications. Although the literature emphasizes the antibacterial properties of cold atmospheric plasma (CAP), the effect of plasma on S. aureus on damaged skin susceptible to S. aureus invasion through the itch-scratch cycle has not been studied to date. Thus, we examined the effectiveness of CAP treatment on S. aureus bacteria in atopic dermatitis lesions using floating electrode dielectric barrier discharge devices, as well as helium and argon plasma jets. Heat distribution on the skin target, ultraviolet C radiation, and ozone generation of plasma jets for the operator of plasma sources were evaluated. Microbial tests confirmed the presence of S. aureus on the lesions of the groups before treatment. The groups exposed to plasma treatment showed a notable reduction in bacterial population compared to the model group (p<0.05). Furthermore, our investigation indicated that plasma treatment reduced pruritus behavior. The findings suggest that cold atmospheric plasma treatment may potentially target skin infections caused by S. aureus in addition to conventional therapies.
Garibian, P.; Rubleva, V.; Burlakov, A.; Valeyev, V.; Kasatkina, A.; Kirova, V.
Intraspecific morphological variability presents a complex challenge for biological systematics and biomonitoring, particularly for organisms with high phenotypic plasticity, such as zooplankton. Morphological differences between individuals of the water flea species Bosmina longirostris (Crustacea: Cladocera) are difficult to distinguish visually: parthenogenetic females look morphologically uniform within the species; nevertheless, they show differences attributable to their geographic origin and developmental stage. A reference dataset of microscopic images was created for the study, including populations from two geographically separated regions (seven from European Russia and seven from Sakhalin Island in the Pacific Ocean, in the Far East of Russia) and two age groups, demonstrating the ability of a neural network to successfully classify the intraspecific morphological variation. This study demonstrates that deep learning methods are promising for the detection and understanding of fine intraspecific morphological differences in cladocerans.